Workflow of Delete and Snapshot Operations in GFS

Let’s learn how GFS carries out delete and snapshot operations.

In the previous lessons, we've discussed the GFS design approach and the workflow of the create, read, and write operations. This lesson covers the workflow of the two remaining operations: delete file and snapshot file/directory.

Delete file

The files on GFS are huge, implying that a file will have several chunks spread across multiple chunkservers. Moreover, each chunk is replicated on multiple chunkservers for availability. Deleting so many chunks from multiple chunkservers while holding the client's delete request would add substantial latency for the client. If any of the replicas are temporarily down, we would have to wait for them to recover before deleting the chunk, producing an unnecessary wait on the client side. So, the file system implements a service called garbage collection. This service deletes the chunks lazily but responds to the client immediately after marking the file as deleted in the namespace tree maintained at the manager node. This is shown in the following illustration.

1. The system consists of a client, a manager node, and the chunkservers.
2. The client requests the manager to delete file_1.
3. The manager marks the file as deleted by adding a "deleted" marker to its name. Note that the file and chunk metadata, as well as the chunk data, are still there.
4. After marking the file as deleted, the manager responds to the client that the file has been deleted.
5. The client tries to read the same file.
6. The manager returns a "file not found" error to the client because the file was marked as deleted, even though all of the file's data is still present.
7. The manager regularly runs a garbage collection process to clean up the metadata and the data of files that have been marked deleted.
8. The garbage collector identifies files that have been marked deleted for longer than some grace period, for instance 30 days (before which the client can still recover the file), and deletes their metadata.
9. The chunkservers inform the manager about their resident chunks via heartbeat messages.
10. The manager is unable to find those chunk handles in its metadata and replies accordingly; the chunkservers then delete those chunks.

The garbage collection service regularly scans the namespace on the manager node to find the files that have been marked deleted, and deletes the metadata for such files. The chunkservers regularly share the chunk handles of the chunks they hold with the manager through heartbeat messages. If the manager doesn't find a chunk handle in its metadata, it informs the chunkserver, and the chunkserver deletes that chunk.
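The delete-then-collect flow can be sketched in Python. The class and method names (`Manager`, `delete_file`, `garbage_collect`, `on_heartbeat`) and the 30-day grace period are illustrative assumptions, not the actual GFS implementation:

```python
from time import time

GRACE_PERIOD = 30 * 24 * 3600  # illustrative: 30 days before metadata is reclaimed

class Manager:
    """Minimal sketch of lazy deletion on the manager node (names are made up)."""

    def __init__(self):
        self.namespace = {}   # file path -> list of chunk handles
        self.deleted_at = {}  # hidden (renamed) path -> deletion timestamp

    def delete_file(self, path):
        # Rename the file with a "deleted" marker and reply immediately;
        # no chunk data is touched here.
        hidden = path + ".deleted"
        self.namespace[hidden] = self.namespace.pop(path)
        self.deleted_at[hidden] = time()
        return "OK"

    def garbage_collect(self):
        # Periodic scan: drop metadata for files deleted more than
        # GRACE_PERIOD ago (before that, the file can still be recovered).
        for hidden, ts in list(self.deleted_at.items()):
            if time() - ts > GRACE_PERIOD:
                del self.namespace[hidden]
                del self.deleted_at[hidden]

    def on_heartbeat(self, chunk_handles):
        # A chunkserver reports its resident chunks; reply with the ones the
        # manager no longer knows about, so the chunkserver can delete them.
        live = {h for handles in self.namespace.values() for h in handles}
        return [h for h in chunk_handles if h not in live]
```

The key point is that `delete_file` only renames metadata and returns; physical chunk deletion happens later, driven by the periodic scan and the heartbeat replies.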

Snapshot file/directory

A snapshot operation creates a copy of a file or a directory tree almost instantaneously and at a low cost. A copy of a file means a copy of all of its data chunks. If someone performs a write operation on a file while a snapshot of it is being taken, the snapshot may capture an incorrect copy of the data. Similarly, concurrent snapshot operations might produce inconsistent data among the copies of the same file. We therefore need to stop all write operations on the file/directory while performing the snapshot operation. We also have to ensure that the source and destination directories are not deleted while the snapshot operation is in progress.

To guard against the problems above, we temporarily disallow some operations on the directories and files involved in the snapshot operation (via locks). For example, if we are taking a snapshot of a file with the full path /dir_src/file_1 and saving the snapshot at the path /dir_des/file_1, we have to acquire read locks on the source and destination directories and write locks on the file at both the source and the destination. This is illustrated below.

1. There is a client, a manager node, and the chunkservers; the manager holds the metadata, while the chunkservers hold the chunk data.
2. The client requests the manager to snapshot "/dir_src/file_1" and place the copy in "/dir_des".
3. The manager acquires read locks on the source and destination directories, "/dir_src" and "/dir_des".
4. The manager acquires a write lock on the file being snapshotted, "/dir_src/file_1", and revokes its chunk leases (not shown here).
5. The manager also acquires a write lock on the file name "/dir_des/file_1" to stop other clients from creating a file with the same name in the destination directory.
6. The manager stops all mutations on "/dir_src/file_1" and creates a copy of its metadata. The chunk data is not duplicated, since both copies refer to the same chunks.
7. The manager responds to the client that the snapshot operation has been performed successfully.
8. The client asks the manager for the chunk handle to write data to "/dir_src/file_1" at a given offset.
9. Based on the offset, the manager finds the corresponding chunk handle. Let's take 2bef as an example.
10. The manager sees that the reference count for chunk 2bef (shown in red) is greater than one. In the example above, the reference count is 2.
11. The manager generates a new chunk ID to duplicate chunk 2bef's data for "/dir_des/file_1", because that data is about to be changed in "/dir_src/file_1".
12. The chunkservers holding chunk 2bef copy its data into the newly created chunk.
13. Once the manager has created a separate chunk for "/dir_des/file_1", it replies to the client with the chunk handle, the replica locations, and a lease for writing data to "/dir_src/file_1".
14. The client pushes the data to the replicas of 2bef and asks them to write the data "n".
15. The replicas perform the write and respond to the client with success.

The read locks on the /dir_src and /dir_des directories stop others from deleting or renaming them. The manager node in GFS ensures that these directories are not deleted or renamed in the namespace tree while a snapshot operation on them is in progress.

The write lock on the file /dir_src/file_1 prevents any mutations on the file while the snapshot operation is in progress. The manager node in GFS does this by revoking the leases for the chunks that belong to the file undergoing the snapshot operation. The replicas holding a lease complete the mutation in progress and then release the lease. For subsequent mutations, the client needs to contact the manager for the chunk lease, and the manager holds the client's request until the snapshot operation is complete. The write lock on the file name /dir_des/file_1 stops the creation of a file with the same name in the destination directory.

Note: Multiple clients can hold read locks on the same resource, while write locks are exclusive. A write lock can't be acquired if there is already a read or write lock on the resource.
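The locking rules above can be sketched as a small reader/writer lock table. This is a single-threaded illustration with made-up names (`NamespaceLocks`, `snapshot_locks`), not the manager's actual lock implementation:

```python
from collections import defaultdict

class NamespaceLocks:
    """Illustrative reader/writer lock table keyed by namespace path."""

    def __init__(self):
        self.readers = defaultdict(int)  # path -> number of read locks held
        self.writers = set()             # paths with an exclusive write lock

    def acquire_read(self, path):
        # Read locks coexist with other read locks, but not with a write lock.
        if path in self.writers:
            raise RuntimeError(f"write lock held on {path}")
        self.readers[path] += 1

    def acquire_write(self, path):
        # Write locks are exclusive: no other read or write lock may be held.
        if path in self.writers or self.readers[path] > 0:
            raise RuntimeError(f"{path} is locked")
        self.writers.add(path)

def snapshot_locks(locks, src_dir, dst_dir, name):
    # Lock set for snapshotting src_dir/name into dst_dir/name:
    # read locks keep the directories from being deleted or renamed;
    # write locks block file mutations and same-name creation.
    locks.acquire_read(src_dir)
    locks.acquire_read(dst_dir)
    locks.acquire_write(f"{src_dir}/{name}")
    locks.acquire_write(f"{dst_dir}/{name}")
```

With these locks held, another client can still read /dir_src (read locks are shared), but any attempt to create or mutate /dir_des/file_1 fails until the snapshot releases its write lock.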

Point to ponder

Question

Do we need to instantaneously duplicate the data for all chunks for the file that is undertaking a snapshot operation?

Answer

We don’t need to duplicate the data unless it has changed. The manager node in GFS duplicates only the metadata for the snapshotted file. In this case, a chunk handle in the metadata will point to two different files (more than two if additional snapshots of the same file are created).

Suppose a client requests a write operation on the chunk that is pointing to more than one file in the metadata. In that case, the manager first generates a new chunk handle, duplicates the chunk data for it, and then performs the write operation on the chunk associated with the requested operation in the metadata.

The mechanism above is called copy-on-write (COW). This idea is borrowed from Linux’s fork system call.
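A minimal sketch of this copy-on-write bookkeeping, with illustrative names (`COWManager`, `refcount`) that are assumptions for the example rather than real GFS structures:

```python
import itertools

class COWManager:
    """Sketch of snapshot + copy-on-write chunk metadata on the manager."""

    def __init__(self):
        self._ids = (f"c{i}" for i in itertools.count())  # chunk-handle generator
        self.files = {}     # path -> list of chunk handles
        self.refcount = {}  # chunk handle -> number of files referencing it

    def create(self, path, n_chunks):
        handles = [next(self._ids) for _ in range(n_chunks)]
        self.files[path] = handles
        for h in handles:
            self.refcount[h] = 1
        return handles

    def snapshot(self, src, dst):
        # Only metadata is duplicated; both files share the same chunks.
        self.files[dst] = list(self.files[src])
        for h in self.files[dst]:
            self.refcount[h] += 1

    def write(self, path, index):
        # Copy-on-write: if the chunk is shared, split it before mutating.
        h = self.files[path][index]
        if self.refcount[h] > 1:
            new = next(self._ids)  # chunkservers would copy the data locally here
            self.refcount[h] -= 1
            self.refcount[new] = 1
            self.files[path][index] = new
        return self.files[path][index]
```

A snapshot is therefore nearly free; the cost of copying a chunk is paid only when (and if) that chunk is first written after the snapshot.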
